feat: add datacenter selection example and update volume examples#44
Pull request overview
Adds new documentation/examples demonstrating datacenter pinning for endpoints, and updates the network volume example to explicitly set datacenter on both the NetworkVolume and Endpoint now that the SDK default is no longer EU-RO-1.
Changes:
- Add a new `04_scaling_performance/02_datacenters/` example (GPU single-DC + multi-DC, and CPU with CPU-supported DC restriction).
- Update the `05_data_workflows/01_network_volumes/` GPU/CPU workers to explicitly set `datacenter` for the shared volume and the endpoints.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| 05_data_workflows/01_network_volumes/gpu_worker.py | Pins the network volume and GPU endpoint to a specific datacenter. |
| 05_data_workflows/01_network_volumes/cpu_worker.py | Pins the shared network volume and CPU endpoint to the same datacenter as the GPU worker. |
| 04_scaling_performance/02_datacenters/gpu_worker.py | New example showing single-DC and multi-DC GPU endpoint pinning. |
| 04_scaling_performance/02_datacenters/cpu_worker.py | New example showing CPU endpoint pinning to a CPU-supported datacenter. |
| 04_scaling_performance/02_datacenters/README.md | New docs describing datacenter selection and the included examples. |

    ## Quick Start

    ```bash
    pip install -r requirements.txt

The Quick Start instructs pip install -r requirements.txt, but this example directory does not include a requirements.txt. Update the instructions to reference the repo-level setup (as other examples do) or add the missing requirements file so the commands work as written.
Suggested change: replace `pip install -r requirements.txt` with `pip install -r ../../requirements.txt`.

    @@ -0,0 +1,29 @@
    # cpu worker pinned to a cpu-supported datacenter.
    # cpu endpoints are only available in a subset of datacenters
    # (see CPU_DATACENTERS). selecting an unsupported DC raises an error.

Comment has a sentence starting mid-line after a period; capitalize the first word for readability/grammar.
Suggested change: replace `# (see CPU_DATACENTERS). selecting an unsupported DC raises an error.` with `# (see CPU_DATACENTERS). Selecting an unsupported DC raises an error.`
runpod-Henrik left a comment
1. Blocker: companion PR #266 must merge first
All new `DataCenter` values used in this PR (`US_GA_1`, `EU_NL_1`, etc.) don't exist in the current SDK; `DataCenter` on `main` has one member: `EU_RO_1`. The examples fail on import against flash v1.9.1:

    DataCenter.US_GA_1  # AttributeError: doesn't exist yet

PR #266 also adds the `datacenter=` alias on `NetworkVolume` and the list-accepting signature for `Endpoint(datacenter=[...])`. Without it:
- `NetworkVolume(datacenter=...)`: silently dropped (no such field on the model; `BaseResource` uses Pydantic v2's default `extra='ignore'`)
- `Endpoint(datacenter=[DataCenter.US_GA_1, DataCenter.EU_RO_1])`: validation error; current type is `DataCenter`, not `Union[DataCenter, list[DataCenter]]`

Hold this PR until #266 merges.
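
For illustration only, here is a minimal sketch of what a list-accepting `datacenter` argument could normalize to. It does not show the SDK's code or the contents of PR #266: the `DataCenter` enum below is a hypothetical stand-in (the `US-GA-1` value is assumed by analogy with `EU-RO-1`), and `normalize_datacenters` is an invented helper name.

```python
from enum import Enum
from typing import List, Optional, Union


class DataCenter(Enum):
    """Hypothetical stand-in for the SDK enum, not the real definition."""

    EU_RO_1 = "EU-RO-1"
    US_GA_1 = "US-GA-1"  # assumed value, by analogy with EU-RO-1


def normalize_datacenters(
    datacenter: Union[DataCenter, List[DataCenter], None],
) -> Optional[List[DataCenter]]:
    """Map a single DC, a list of DCs, or None onto one canonical shape.

    None is passed through and stands for "no pinning" (all DCs).
    """
    if datacenter is None:
        return None
    if isinstance(datacenter, DataCenter):
        return [datacenter]
    items = list(datacenter)
    for item in items:
        # Reject plain strings and other types instead of dropping them silently.
        if not isinstance(item, DataCenter):
            raise TypeError(f"expected DataCenter, got {type(item).__name__}")
    return items
```

Rejecting non-enum items loudly is a deliberate choice in this sketch: it avoids the silent-drop behaviour described above for unknown fields.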
2. cpu_worker.py uses LB-style API; gpu_worker.py uses QB-style
Both workers demonstrate datacenter selection, but they use different API patterns: cpu_worker.py creates an api = Endpoint(...) instance and registers routes with @api.post()/@api.get(), while gpu_worker.py uses the @Endpoint(...) function decorator. A user reading the example to learn about datacenters ends up learning two different deployment patterns simultaneously, which blurs what the example is actually teaching.
If the LB-style is intentional for CPU workers in this section, the README should call it out. If not, it would be cleaner to use QB-style for both:

    @Endpoint(name="04_02_cpu_eu", cpu="cpu3c-2-4", workers=(0, 2), datacenter=DataCenter.EU_RO_1)
    async def process(data: dict) -> dict:
        return {"datacenter": "EU-RO-1", "result": data}

3. Multi-DC response is missing the datacenter indicator
The single-DC endpoint returns `{"datacenter": "US-GA-1", "result": payload}`, making it easy to verify which DC handled the request. The multi-DC endpoint only returns `{"result": payload}`, so you can't tell which DC ran the job. Worth including for the example to be demonstrable:

    # multi-DC endpoint
    return {"datacenters": ["US-GA-1", "EU-RO-1"], "result": payload}

Or note in the README how to verify DC placement another way.
4. Network volume update
Good change: explicitly setting `datacenter=DataCenter.EU_RO_1` on both `NetworkVolume` and `Endpoint` is the right fix now that the default changes to `None`. The inline comment `# same volume as gpu_worker.py -- must match name and datacenter` is helpful.
Nits
- `README.md` Quick Start says `pip install -r requirements.txt` but no `requirements.txt` is included in the diff. If there's a shared one at a parent level, the path should be explicit.
- The hardcoded string `"datacenter": "EU-RO-1"` in the `cpu_worker.py` response will drift if the DC is changed. Could use `DataCenter.EU_RO_1.value` instead.
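
A rough sketch of the second nit, assuming the enum member's value is the string `"EU-RO-1"`. The `DataCenter` class here is a local stand-in rather than the SDK import, and `ENDPOINT_DATACENTER` and `build_response` are hypothetical names used only for this illustration:

```python
from enum import Enum


class DataCenter(Enum):
    """Hypothetical stand-in for the SDK enum, not the real definition."""

    EU_RO_1 = "EU-RO-1"


# Single source of truth: the same constant would be passed to the endpoint
# configuration and reused in the response body.
ENDPOINT_DATACENTER = DataCenter.EU_RO_1


def build_response(data: dict) -> dict:
    """Build the worker response without repeating the DC name as a literal."""
    return {"datacenter": ENDPOINT_DATACENTER.value, "result": data}
```

With this shape, changing the pinned DC means editing one constant, and the reported value follows automatically.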
Verdict
Hold until PR #266 merges. The examples are well-structured and the datacenter coverage (single DC, multi-DC, CPU restriction) is the right scope for this section. The LB vs QB inconsistency is worth addressing before merge, but isn't a blocker on its own.
The flash SDK no longer hardcodes EU-RO-1 as the only datacenter. `Endpoint(datacenter=...)` accepts a single DC, a list, or `None` (all DCs). Network volumes take `dataCenterId` to specify which DC they live in.

Adds `04_scaling_performance/02_datacenters/` with a GPU worker showing single-DC and multi-DC pinning, and a CPU worker showing how CPU DC restrictions work.

Updates the network volume example (`05_data_workflows/01_network_volumes/`) to explicitly set `datacenter` on both the volumes and the endpoints, since the default is now `None` instead of `EU-RO-1`.

Companion PR: runpod/flash#266